AITopics | ordinal variable

Collaborating Authors

ordinal variable

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

TabSODA: Tabular Diffusion based Imputation with Skip Pattern Detection and Ordinal Awareness

Chen, Yuyu, Kim, Taehyo, Shu, Hai, Feng, Yang

arXiv.org Machine LearningJun-5-2026

Missing data imputation in large-scale surveys faces two challenges that are not well handled by current tabular diffusion methods. First, \emph{structural skips}, cells made inapplicable by questionnaire design, should not be imputed but are often conflated with item nonresponse. Second, \emph{ordinal} responses encode ordered categories, yet most pipelines treat them as nominal levels through one-hot or analog-bit encodings. We introduce \textbf{TabSODA} (\textbf{Tab}ular diffusion with \textbf{S}kip pattern detection and \textbf{O}r\textbf{d}inal \textbf{A}wareness), an Expectation-Maximization (EM)-based diffusion imputer built on the Elucidated Diffusion Model (EDM) framework. TabSODA propagates structural skips through the denoising loss and reverse-time sampler, and represents ordinal variables with cumulative-probit scalar latents while retaining analog-bit encodings for nominal variables. When a codebook skip mask is available, TabSODA uses it directly; otherwise, the TabSODA+SKIP variant estimates the mask from raw responses and questionnaire order using a CART-based skip-pattern miner. On Population Assessment of Tobacco and Health (PATH) study and the National Survey on Drug Use and Health (NSDUH), two nationally representative U.S.\ surveys, TabSODA reduces ordinal MACE by up to $23.7\%$ and improves categorical accuracy by up to $9\%$ over the strongest baseline across MCAR, MAR, and MNAR masking. The skip miner achieves near-perfect precision on both datasets, allowing TabSODA+SKIP to closely track the codebook-mask variant.

artificial intelligence, decision tree learning, machine learning, (18 more...)

arXiv.org Machine Learning

2606.05361

Country: North America > United States (1.00)

Genre:

Research Report (1.00)
Questionnaire & Opinion Survey (0.87)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Addiction Disorder (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Consumer Health (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
(2 more...)

Add feedback

Bounce: Reliable High-Dimensional Bayesian Optimization for Combinatorial and Mixed Spaces

Neural Information Processing SystemsMay-1-2026, 01:32:43 GMT

Impactful applications such as materials discovery, hardware design, neural architecture search, or portfolio optimization require optimizing high-dimensional black-box functions with mixed and combinatorial input spaces. While Bayesian optimization has recently made significant progress in solving such problems, an in-depth analysis reveals that the current state-of-the-art methods are not reliable. Their performances degrade substantially when the unknown optima of the function do not have a certain structure. To fill the need for a reliable algorithm for combinatorial and mixed spaces, this paper proposes Bounce that relies on a novel map of various variable types into nested embeddings of increasing dimensionality. Comprehensive experiments show that Bounce reliably achieves and often even improves upon state-of-the-art performance on a variety of high-dimensional problems.

artificial intelligence, benchmark, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (0.46)
Europe (0.28)

Genre: Research Report (0.66)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)

Add feedback

Omnipresent Yet Overlooked: Heat Kernels in Combinatorial Bayesian Optimization

Doumont, Colin, Picheny, Victor, Borovitskiy, Viacheslav, Moss, Henry

arXiv.org Artificial IntelligenceOct-31-2025

Bayesian Optimization (BO) has the potential to solve various combinatorial tasks, ranging from materials science to neural architecture search. However, BO requires specialized kernels to effectively model combinatorial domains. Recent efforts have introduced several combinatorial kernels, but the relationships among them are not well understood. To bridge this gap, we develop a unifying framework based on heat kernels, which we derive in a systematic way and express as simple closed-form expressions. Using this framework, we prove that many successful combinatorial kernels are either related or equivalent to heat kernels, and validate this theoretical claim in our experiments. Moreover, our analysis confirms and extends the results presented in Bounce: certain algorithms' performance decreases substantially when the unknown optima of the function do not have a certain structure. In contrast, heat kernels are not sensitive to the location of the optima. Lastly, we show that a fast and simple pipeline, relying on heat kernels, is able to achieve state-of-the-art results, matching or even outperforming certain slow or complex algorithms.

artificial intelligence, kernel, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2510.26633

Country: Europe > United Kingdom > England (0.28)

Genre:

Overview (0.67)
Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

A Latent Causal Inference Framework for Ordinal Variables

Scauda, Martina, Kuipers, Jack, Moffa, Giusi

arXiv.org Machine LearningFeb-14-2025

Ordinal variables, such as on the Likert scale, are common in applied research. Yet, existing methods for causal inference tend to target nominal or continuous data. When applied to ordinal data, this fails to account for the inherent ordering or imposes well-defined relative magnitudes. Hence, there is a need for specialised methods to compute interventional effects between ordinal variables while accounting for their ordinality. One potential framework is to presume a latent Gaussian Directed Acyclic Graph (DAG) model: that the ordinal variables originate from marginally discretizing a set of Gaussian variables whose latent covariance matrix is constrained to satisfy the conditional independencies inherent in a DAG. Conditioned on a given latent covariance matrix and discretisation thresholds, we derive a closed-form function for ordinal causal effects in terms of interventional distributions in the latent space. Our causal estimation combines naturally with algorithms to learn the latent DAG and its parameters, like the Ordinal Structural EM algorithm. Simulations demonstrate the applicability of the proposed approach in estimating ordinal causal effects both for known and unknown structures of the latent graph. As an illustration of a real-world use case, the method is applied to survey data of 408 patients from a study on the functional relationships between symptoms of obsessive-compulsive disorder and depression.

artificial intelligence, bayesian inference, machine learning, (14 more...)

arXiv.org Machine Learning

2502.10276

Country:

Europe > Switzerland (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
North America > United States (0.28)
Europe > Austria (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)

Add feedback

Asymptotically Exact and Fast Gaussian Copula Models for Imputation of Mixed Data Types

Christoffersen, Benjamin, Clements, Mark, Humphreys, Keith, Kjellström, Hedvig

arXiv.org Machine LearningFeb-4-2021

Missing values with mixed data types is a common problem in a large number of machine learning applications such as processing of surveys and in different medical applications. Recently, Gaussian copula models have been suggested as a means of performing imputation of missing values using a probabilistic framework. While the present Gaussian copula models have shown to yield state of the art performance, they have two limitations: they are based on an approximation that is fast but may be imprecise and they do not support unordered multinomial variables. We address the first limitation using direct and arbitrarily precise approximations both for model estimation and imputation by using randomized quasi-Monte Carlo procedures. The method we provide has lower errors for the estimated model parameters and the imputed values, compared to previously proposed methods. We also extend the previous Gaussian copula models to include unordered multinomial variables in addition to the present support of ordinal, binary, and continuous variables.

approximation, multinomial variable, zhao & udell, (12 more...)

arXiv.org Machine Learning

2102.02642

Country:

Europe > Austria > Vienna (0.14)
Europe > Sweden (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine > Therapeutic Area (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Learning a binary search with a recurrent neural network. A novel approach to ordinal regression analysis

Falissard, Louis, Bounebache, Karim, Rey, Grégoire

arXiv.org Machine LearningJan-7-2021

Deep neural networks are a family of computational models that are naturally suited to the analysis of hierarchical data such as, for instance, sequential data with the use of recurrent neural networks. In the other hand, ordinal regression is a well-known predictive modelling problem used in fields as diverse as psychometry to deep neural network based voice modelling. Their specificity lies in the properties of their outcome variable, typically considered as a categorical variable with natural ordering properties, typically allowing comparisons between different states ("a little" is less than "somewhat" which is itself less than "a lot", with transitivity allowed). This article investigates the application of sequence-to-sequence learning methods provided by the deep learning framework in ordinal regression, by formulating the ordinal regression problem as a sequential binary search. A method for visualizing the model's explanatory variables according to the ordinal target variable is proposed, that bears some similarities to linear discriminant analysis. The method is compared to traditional ordinal regression methods on a number of benchmark dataset, and is shown to have comparable or significantly better predictive power.

dataset, neural network, recurrent neural network, (16 more...)

arXiv.org Machine Learning

2101.02609

Country:

North America > United States > California (0.05)
Europe > France (0.05)
North America > United States > Wisconsin (0.04)

Genre: Research Report > Experimental Study (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Guide to Encoding Categorical Features Using Scikit-Learn For Machine Learning

#artificialintelligenceDec-11-2020, 03:40:19 GMT

One of the most crucial preprocessing steps in any machine learning project is feature encoding. It is the process of turning categorical data in a dataset into numerical data. It is essential that we perform feature encoding because most machine learning models can only interpret numerical data and not data in text form. As usual, I will demonstrate these concepts through a practical case study using the students' performance in exams dataset on Kaggle. You can find the complete notebook up on my GitHub here.

dataset, ordinal variable, student, (15 more...)

#artificialintelligence

Industry: Education > Assessment & Standards > Student Performance (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Beyond One-Hot: an exploration of categorical variables

#artificialintelligenceAug-14-2016, 22:45:27 GMT

In machine learning, data are king. The algorithms and models used to make predictions with the data are important, and very interesting, but ML is still subject to the idea of garbage-in-garbage-out. With that in mind, let's look at a little subset of those input data: categorical variables. Categorical variables (wiki) are those that represent a fixed number of possible values, rather than a continuous number. Each value assigns the measurement to one of those finite groups, or categories. They differ from ordinal variables in that the distance from one category to another ought to be equal regardless of the number of categories, as opposed to ordinal variables which have some intrinsic ordering.

artificial intelligence, dataset, machine learning, (7 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.76)

Add feedback

Ordinal regression - Wikipedia, the free encyclopedia

#artificialintelligenceMay-10-2016, 21:40:21 GMT

In statistics, ordinal regression (also called "ordinal classification") is a type of regression analysis used for predicting an ordinal variable, i.e. a variable whose value exists on an arbitrary scale where only the relative ordering between different values is significant. It can be considered an intermediate problem in between (metric) regression and classification.[1] Ordinal regression turns up often in the social sciences, for example in the modeling of human levels of preference (on a scale from, say, 1–5 for "very poor" through "excellent"), as well as in information retrieval. In machine learning, ordinal regression may also be called ranking learning.[2][a] Ordinal regression can be performed using a generalized linear model (GLM) that fits both a coefficient vector and a set of thresholds to a dataset.

artificial intelligence, machine learning, regression, (9 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.37)

Add feedback

Metric Learning for Ordinal Data

Shi, Yuan (University of Southern California) | Li, Wenzhe (University of Southern California) | Sha, Fei (University of Southern California)

AAAI ConferencesApr-19-2016

A large amount of ordinal-valued data exist in many domains, including medical and health science, social science, economics, political science, etc. Unlike image and speech datasets of real-valued data, learning with ordinal variables (i.e., features) presents unique challenges. In particular, the nominal differences between those feature values, which are just ranks, do not necessarily correspond to the real distances between the corresponding categories. Given their wide existence, it is imperative to develop machine learning algorithms that specifically address the need to model and infer with such data. In this paper, we present a novel metric learning algorithm that takes into consideration the nature of ordinal data. Our approach treats ordinal values as latent variables in intervals. Our algorithm then learns what those intervals are as well as distance metrics to measure distances between latent variables in those intervals. We derive the corresponding optimization algorithm and demonstrate how that can be solved effectively. Experimental results show that the proposed approach significantly improves baselines that do not explicitly model ordinal features.

artificial intelligence, bayesian inference, machine learning, (17 more...)

AAAI Conferences

Thirtieth AAAI Conference on Artificial Intelligence

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Industry: Health & Medicine > Consumer Health (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback